Student Name : Joseph Hencil Peter

NRIC : S7967093F

Data Wrangling Assignment

Assignment - Data Wrangling using the Covid-19 Dataset (Updated)

The Coronavirus disease 2019 (COVID-19), formerly known as 2019-nCoV acute respiratory disease, is an infectious disease caused by SARS-CoV-2, a virus closely related to the SARS virus. The disease is the cause of the 2019–20 coronavirus outbreak. It is primarily spread between people via respiratory droplets from infected individuals when they cough or sneeze. Time from exposure to onset of symptoms is generally between 2 and 14 days. Spread can be limited by handwashing and other hygiene measures.

In this assignment, we will use the Novel Corona Virus 2019 Dataset, obtained from Kaggle, to perform some data wrangling exercises.

This dataset consists of three files:

Each of these files contains the number of confirmed cases, as well as the number of recoveries and deaths that resulted from the COVID-19 disease. The areas affected are classified according to provinces/states, as well as countries/regions.

Take a minute to explore the various columns and rows in the dataset.

Import the Packages

Setting the Pandas Print Option

Load the CSV Files

View the DataFrames

Observe the various rows and columns of the loaded dataframes

Extract the names of columns containing dates and times

For each dataframe, extract the names of all the columns that contain a date and time, e.g., 1/21/20 22:00, 1/22/20 12:00, etc.
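A minimal sketch of one way to do this, assuming the non-date columns are a known set of identifier columns. The names `Province/State`, `Country/Region`, `Lat`, and `Long`, the variable `confirmed`, and the sample values are assumptions for illustration, not taken from the actual files:

```python
import pandas as pd

# Hypothetical wide-format sample with made-up values;
# the real files contain many more date columns.
confirmed = pd.DataFrame({
    "Province/State": ["Hubei", None],
    "Country/Region": ["Mainland China", "Thailand"],
    "Lat": [30.0, 15.0],
    "Long": [112.0, 101.0],
    "1/21/20 22:00": [10, 2],
    "1/22/20 12:00": [12, 2],
})

# Assumed identifier columns; every other column is treated as a date column.
id_cols = ["Province/State", "Country/Region", "Lat", "Long"]
date_cols = [col for col in confirmed.columns if col not in id_cols]

print(date_cols)
```

The same list comprehension can be applied to the recovered and deaths dataframes, since each file may cover a slightly different set of timestamps.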


Unpivot a DataFrame from wide format to long format

Unpivot the dataframes so that the dates are no longer represented as columns. Instead, the dates should be stored as values under a single column, say Date, and the number of cases should be stored in a corresponding value column: Confirmed, Recovered, or Deaths, depending on the dataframe.
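The unpivot step can be sketched with `DataFrame.melt`. The sample frame, its values, and the identifier column names below are assumptions for illustration:

```python
import pandas as pd

# Hypothetical wide-format sample with made-up values.
confirmed = pd.DataFrame({
    "Province/State": ["Hubei", "Guangdong"],
    "Country/Region": ["Mainland China", "Mainland China"],
    "1/21/20 22:00": [10, 3],
    "1/22/20 12:00": [12, 5],
})

# Columns that identify a location stay as they are;
# every remaining column is folded into Date/Confirmed pairs.
id_cols = ["Province/State", "Country/Region"]
confirmed_long = confirmed.melt(
    id_vars=id_cols,
    var_name="Date",
    value_name="Confirmed",
)

print(confirmed_long)
```

Repeating the call with `value_name="Recovered"` and `value_name="Deaths"` on the other two dataframes gives three long-format frames with the same key columns.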

View the unpivoted DataFrames

Display all the unpivoted dataframes

Combine all the unpivoted dataframes into one single dataframe

Combine the figures for Confirmed, Recovered, and Deaths into a single dataframe.
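One possible way to combine the three long-format frames is an outer merge on the shared key columns, so that a row missing from one file is kept rather than dropped. The frame names, key columns, and values here are assumptions for illustration:

```python
import pandas as pd

# Key columns assumed to be shared by all three long-format frames.
keys = ["Province/State", "Country/Region", "Date"]

# Hypothetical long-format samples with made-up values.
confirmed_long = pd.DataFrame({
    "Province/State": ["Hubei", "Hubei"],
    "Country/Region": ["Mainland China", "Mainland China"],
    "Date": ["1/21/20 22:00", "1/22/20 12:00"],
    "Confirmed": [10, 12],
})
recovered_long = pd.DataFrame({
    "Province/State": ["Hubei"],
    "Country/Region": ["Mainland China"],
    "Date": ["1/21/20 22:00"],
    "Recovered": [1],
})
deaths_long = pd.DataFrame({
    "Province/State": ["Hubei", "Hubei"],
    "Country/Region": ["Mainland China", "Mainland China"],
    "Date": ["1/21/20 22:00", "1/22/20 12:00"],
    "Deaths": [0, 1],
})

# Outer merges keep every key combination that appears in any frame;
# figures missing from one frame show up as NaN.
combined = (
    confirmed_long
    .merge(recovered_long, on=keys, how="outer")
    .merge(deaths_long, on=keys, how="outer")
)

print(combined)
```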


Replace all NAs with 0s

Replace all the empty cells in the dataframe with 0.
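A short sketch of the replacement using `DataFrame.fillna`. The sample frame and its values are made up, and the cast back to integers is an optional extra, since a column that held NaN is stored as float:

```python
import numpy as np
import pandas as pd

# Hypothetical combined frame with made-up values and some missing figures.
combined = pd.DataFrame({
    "Country/Region": ["Mainland China", "Thailand"],
    "Confirmed": [10.0, 2.0],
    "Recovered": [1.0, np.nan],
    "Deaths": [17.0, np.nan],
})

count_cols = ["Confirmed", "Recovered", "Deaths"]

# Replace every NaN with 0, then store the counts as integers again.
filled = combined.fillna(0).astype({col: int for col in count_cols})

print(filled)
```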


Change the Date column to Datetime format

Observe that the Date column contains both date and time, e.g., 1/25/20 12:00. Some cases are reported several times a day. For this, it is useful to:
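A minimal sketch of one way to handle this, assuming the goal is to parse the strings into datetimes and then drop the time of day so that several reports on the same day share one date. The format string is inferred from the example values in the text, and the sample values are made up:

```python
import pandas as pd

# Hypothetical sample: two reports on the same day, one on the next day.
df = pd.DataFrame({
    "Date": ["1/25/20 12:00", "1/25/20 22:00", "1/26/20 23:00"],
    "Confirmed": [10, 14, 20],
})

# Assumed format: month/day/two-digit year, then hour:minute.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y %H:%M")

# normalize() resets the time to midnight, keeping the datetime dtype.
df["Date"] = df["Date"].dt.normalize()

print(df)
```

Using `dt.normalize()` rather than `dt.date` keeps the column as a datetime type, which is convenient for later grouping and sorting.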


Display the daily number of confirmed, recovered, and deaths

Display the daily numbers of confirmed cases, recoveries, and deaths.
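One plausible approach, assuming the figures are cumulative counts and the Date column has already been reduced to whole days: group by location and date, and keep the largest value reported that day. The column names and sample values are assumptions for illustration:

```python
import pandas as pd

# Hypothetical sample with made-up values: two reports on 25 Jan, one on 26 Jan.
df = pd.DataFrame({
    "Province/State": ["Hubei", "Hubei", "Hubei"],
    "Country/Region": ["Mainland China"] * 3,
    "Date": pd.to_datetime(["2020-01-25", "2020-01-25", "2020-01-26"]),
    "Confirmed": [10, 14, 20],
    "Recovered": [1, 2, 3],
    "Deaths": [0, 1, 1],
})

count_cols = ["Confirmed", "Recovered", "Deaths"]

# dropna=False keeps rows whose Province/State is missing (needs pandas >= 1.1).
# max() assumes cumulative counts, so the largest value is the day's final figure.
daily = (
    df.groupby(["Province/State", "Country/Region", "Date"],
               as_index=False, dropna=False)[count_cols]
      .max()
)

print(daily)
```

If the figures were per-report increments instead of cumulative totals, `sum()` would be the appropriate aggregation in place of `max()`.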


Display the total daily number of confirmed, recovered, and deaths for each country

Display the daily numbers of Confirmed, Recovered, and Deaths cases. For this, we are only interested in the total numbers for each country.
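The per-country totals can be sketched by grouping on date and country and summing over provinces. The frame name `daily`, the column names, and the values are assumptions for illustration:

```python
import pandas as pd

# Hypothetical daily figures with made-up values; one row per province and day.
daily = pd.DataFrame({
    "Province/State": ["Hubei", "Guangdong", None],
    "Country/Region": ["Mainland China", "Mainland China", "Thailand"],
    "Date": pd.to_datetime(["2020-01-25"] * 3),
    "Confirmed": [14, 6, 2],
    "Recovered": [2, 1, 0],
    "Deaths": [1, 0, 0],
})

count_cols = ["Confirmed", "Recovered", "Deaths"]

# Summing over provinces gives one row per country per day.
by_country = (
    daily.groupby(["Date", "Country/Region"], as_index=False)[count_cols]
         .sum()
)

print(by_country)
```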


Display the data for the most recent day

Show the data for each country for the most recent day.
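A small sketch of selecting the latest day, assuming a per-country frame with a datetime Date column. The frame name `by_country`, the placeholder country labels, and the values are hypothetical:

```python
import pandas as pd

# Hypothetical per-country daily totals with made-up values.
by_country = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-25", "2020-01-25",
                            "2020-01-26", "2020-01-26"]),
    "Country/Region": ["Country A", "Country B", "Country A", "Country B"],
    "Confirmed": [10, 3, 15, 4],
    "Recovered": [1, 0, 2, 1],
    "Deaths": [0, 0, 1, 0],
})

# The most recent day is the largest value in the Date column.
latest_date = by_country["Date"].max()
latest = by_country[by_country["Date"] == latest_date].reset_index(drop=True)

print(latest)
```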

Top 10 countries with confirmed cases

Display the top 10 countries with confirmed cases.
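One way to pick the top 10 is `DataFrame.nlargest` on the Confirmed column of the most recent day's data. The frame name `latest` and the generated placeholder countries and counts are hypothetical:

```python
import pandas as pd

# Hypothetical latest-day figures for 12 placeholder countries (made-up counts).
latest = pd.DataFrame({
    "Country/Region": [f"Country {i}" for i in range(12)],
    "Confirmed": [i * 10 for i in range(12)],
})

# nlargest returns the rows with the highest Confirmed values, in descending order.
top10 = latest.nlargest(10, "Confirmed").reset_index(drop=True)

print(top10)
```

An equivalent alternative is `latest.sort_values("Confirmed", ascending=False).head(10)`.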

The End